Goto

Collaborating Authors

 ais data


Federated Learning and Trajectory Compression for Enhanced AIS Coverage

Gräupl, Thomas, Reisenbauer, Andreas, Hecko, Marcel, Rasouli, Anil, Graser, Anita, Dragaschnig, Melitta, Weissenfeld, Axel, Dejaegere, Gilles, Sakr, Mahmoud

arXiv.org Artificial Intelligence

Abstract--This paper presents the V esselEdge system, which leverages federated learning and bandwidth-constrained trajectory compression to enhance maritime situational awareness by extending AIS coverage. V esselEdge transforms vessels into mobile sensors, enabling real-time anomaly detection and efficient data transmission over low-bandwidth connections. The system integrates the M fed model for federated learning and the BWC-DR-A algorithm for trajectory compression, prioritizing anomalous data. Preliminary results demonstrate the effectiveness of V esselEdge in improving AIS coverage and situational awareness using historical data. The Automatic Identification System (AIS) is a tracking system that uses transceivers on ships to monitor marine traffic.


MH-GIN: Multi-scale Heterogeneous Graph-based Imputation Network for AIS Data (Extended Version)

Liu, Hengyu, Li, Tianyi, He, Yuqiang, Torp, Kristian, Li, Yushuai, Jensen, Christian S.

arXiv.org Artificial Intelligence

Location-tracking data from the Automatic Identification System, much of which is publicly available, plays a key role in a range of maritime safety and monitoring applications. However, the data suffers from missing values that hamper downstream applications. Imputing the missing values is challenging because the values of different heterogeneous attributes are updated at diverse rates, resulting in the occurrence of multi-scale dependencies among attributes. Existing imputation methods that assume similar update rates across attributes are unable to capture and exploit such dependencies, limiting their imputation accuracy. We propose MH-GIN, a Multi-scale Heterogeneous Graph-based Imputation Network that aims improve imputation accuracy by capturing multi-scale dependencies. Specifically, MH-GIN first extracts multi-scale temporal features for each attribute while preserving their intrinsic heterogeneous characteristics. Then, it constructs a multi-scale heterogeneous graph to explicitly model dependencies between heterogeneous attributes to enable more accurate imputation of missing values through graph propagation. Experimental results on two real-world datasets find that MH-GIN is capable of an average 57% reduction in imputation errors compared to state-of-the-art methods, while maintaining computational efficiency. The source code and implementation details of MH-GIN are publicly available https://github.com/hyLiu1994/MH-GIN.


Predicting Barge Tow Size on Inland Waterways Using Vessel Trajectory Derived Features: Proof of Concept

Agorku, Geoffery, Hernandez, Sarah, Hames, Hayley, Wagner, Cade

arXiv.org Artificial Intelligence

Accurate, real-time estimation of barge quantity on inland waterways remains a critical challenge due to the non-self-propelled nature of barges and the limitations of existing monitoring systems. This study introduces a novel method to use Automatic Identification System (AIS) vessel tracking data to predict the number of barges in tow using Machine Learning (ML). To train and test the model, barge instances were manually annotated from satellite scenes across the Lower Mississippi River. Labeled images were matched to AIS vessel tracks using a spatiotemporal matching procedure. A comprehensive set of 30 AIS-derived features capturing vessel geometry, dynamic movement, and trajectory patterns were created and evaluated using Recursive Feature Elimination (RFE) to identify the most predictive variables. Six regression models, including ensemble, kernel-based, and generalized linear approaches, were trained and evaluated. The Poisson Regressor model yielded the best performance, achieving a Mean Absolute Error (MAE) of 1.92 barges using 12 of the 30 features. The feature importance analysis revealed that metrics capturing vessel maneuverability such as course entropy, speed variability and trip length were most predictive of barge count. The proposed approach provides a scalable, readily implementable method for enhancing Maritime Domain Awareness (MDA), with strong potential applications in lock scheduling, port management, and freight planning. Future work will expand the proof of concept presented here to explore model transferability to other inland rivers with differing operational and environmental conditions.


Machine Learning-Based Classification of Vessel Types in Straits Using AIS Tracks

Nielsen, Jonatan Katz

arXiv.org Artificial Intelligence

Accurate recognition of vessel types from Automatic Identification System (AIS) tracks is essential for safety oversight and combating illegal, unreported, and unregulated (IUU) activity. This paper presents a strait-scale, machine-learning pipeline that classifies moving vessels using only AIS data. We analyze eight days of historical AIS from the Danish Maritime Authority covering the Bornholm Strait in the Baltic Sea (January 22-30, 2025). After forward/backward filling voyage records, removing kinematic and geospatial outliers, and segmenting per-MMSI tracks while excluding stationary periods ($\ge 1$ h), we derive 31 trajectory-level features spanning kinematics (e.g., SOG statistics), temporal, geospatial (Haversine distances, spans), and ship-shape attributes computed from AIS A/B/C/D reference points (length, width, aspect ratio, bridge-position ratio). To avoid leakage, we perform grouped train/test splits by MMSI and use stratified 5-fold cross-validation. Across five classes (cargo, tanker, passenger, high-speed craft, fishing; N=1{,}910 trajectories; test=382), tree-based models dominate: a Random Forest with SMOTE attains 92.15% accuracy (macro-precision 94.11%, macro-recall 92.51%, macro-F1 93.27%) on the held-out test set, while a tuned RF reaches one-vs-rest ROC-AUC up to 0.9897. Feature-importance analysis highlights the bridge-position ratio and maximum SOG as the most discriminative signals; principal errors occur between cargo and tanker, reflecting similar transit behavior. We demonstrate operational value by backfilling missing ship types on unseen data and discuss improvements such as DBSCAN based trip segmentation and gradient-boosted ensembles to handle frequent-stop ferries and further lift performance. The results show that lightweight features over AIS trajectories enable real-time vessel type classification in straits.


Multi-Model Synthetic Training for Mission-Critical Small Language Models

Platt, Nolan, Nayak, Pragyansmita

arXiv.org Artificial Intelligence

Abstract--Large Language Models (LLMs) have demonstrated remarkable capabilities across many domains, yet their application to specialized fields remains constrained by the scarcity and complexity of domain-specific training data. We present a novel approach that achieves a 261x cost reduction for maritime intelligence by using LLMs as one-time teachers rather than using them directly for inference. Our method transforms 3.2 billion Automatic Identification System (AIS) vessel tracking records into 21,543 synthetic question and answer pairs through multi-model generation (GPT -4o and o3-mini), preventing over-fitting and ensuring accurate reasoning. We show that smaller, cheaper models - when fine tuned properly - can provide similar accuracy compared to larger models that are prohibitively expensive. Our work contributes to the growing field of synthetic dataset generation for specialized AI applications and presents a highly reproducible framework for domains where manual annotation is infeasible. Beyond expanding research in the growing field of specialized small language models, our approach has immediate applications in maritime safety, security operations, and vessel traffic management systems in various industries. In recent years, Large Language Models (LLMs) have proven successful across diverse natural language tasks, but their usage for specialized domains faces a large challenge: the cost of continuous LLM inference, often reaching thousands of dollars per day for real-time systems [1].


Using LLMs for Analyzing AIS Data

Merten, Gaspard, Dejaegere, Gilles, Sakr, Mahmoud

arXiv.org Artificial Intelligence

Data Science and Engineering Lab Universit e libre de Bruxelles Brussels, Belgium gaspard.merten@ulb.be Data Science and Engineering Lab Universit e libre de Bruxelles Brussels, Belgium gilles.dejaegere@ulb.be Data Science and Engineering Lab Universit e libre de Bruxelles Brussels, Belgium mahmoud.sakr@ulb.be Abstract --Recent research in Large Language Models (LLMs), has had a profound impact across various fields, including mobility data science. This paper explores the and experiment with different approaches to using LLMs for analyzing AIS data. We propose a set of carefully designed queries to assess the reasoning capabilities of LLMs in this kind of tasks. Further, we experiment with four different methods: (1) using LLMs as a natural language interface to a spatial database, (2) reasoning on raw data, (3) reasoning on compressed trajectories, and (4) reasoning on semantic trajectories. We investigate the strengths and weaknesses for the four methods, and discuss the findings. The goal is to provide valuable insights for both researchers and practitioners on selecting the most appropriate LLM-based method depending on their specific data analysis objectives. The significant development in artificial machine learning has also opened the way to new approaches to solve real-world geospatial problems. In particular, Large Language Models (LLMs) have emerged as powerful tools for understanding and generating human-like text. These models have demonstrated remarkable abilities in natural language processing tasks, from answering complex queries to summarizing and interpreting information in various domains. This exponential increase of LLMs usage can also be witnessed in the domain of Geographic Information Systems (GIS) in recent years.


Fusing Monocular RGB Images with AIS Data to Create a 6D Pose Estimation Dataset for Marine Vessels

Holst, Fabian, Gülsoylu, Emre, Frintrop, Simone

arXiv.org Artificial Intelligence

The paper presents a novel technique for creating a 6D pose estimation dataset for marine vessels by fusing monocular RGB images with Automatic Identification System (AIS) data. The proposed technique addresses the limitations of relying purely on AIS for location information, caused by issues like equipment reliability, data manipulation, and transmission delays. By combining vessel detections from monocular RGB images, obtained using an object detection network (YOLOX-X), with AIS messages, the technique generates 3D bounding boxes that represent the vessels' 6D poses, i.e. spatial and rotational dimensions. The paper evaluates different object detection models to locate vessels in image space. We also compare two transformation methods (homography and Perspective-n-Point) for aligning AIS data with image coordinates. The results of our work demonstrate that the Perspective-n-Point (PnP) method achieves a significantly lower projection error compared to homography-based approaches used before, and the YOLOX-X model achieves a mean Average Precision (mAP) of 0.80 at an Intersection over Union (IoU) threshold of 0.5 for relevant vessel classes. We show indication that our approach allows the creation of a 6D pose estimation dataset without needing manual annotation. Additionally, we introduce the Boats on Nordelbe Kehrwieder (BONK-pose), a publicly available dataset comprising 3753 images with 3D bounding box annotations for pose estimation, created by our data fusion approach. This dataset can be used for training and evaluating 6D pose estimation networks. In addition we introduce a set of 1000 images with 2D bounding box annotations for ship detection from the same scene.


AIS-LLM: A Unified Framework for Maritime Trajectory Prediction, Anomaly Detection, and Collision Risk Assessment with Explainable Forecasting

Park, Hyobin, Jung, Jinwook, Seo, Minseok, Choi, Hyunsoo, Cho, Deukjae, Park, Sekil, Choi, Dong-Geol

arXiv.org Artificial Intelligence

With the increase in maritime traffic and the mandatory implementation of the Automatic Identification System (AIS), the importance and diversity of maritime traffic analysis tasks based on AIS data, such as vessel trajectory prediction, anomaly detection, and collision risk assessment, is rapidly growing. However, existing approaches tend to address these tasks individually, making it difficult to holistically consider complex maritime situations. To address this limitation, we propose a novel framework, AIS-LLM, which integrates time-series AIS data with a large language model (LLM). AIS-LLM consists of a Time-Series Encoder for processing AIS sequences, an LLM-based Prompt Encoder, a Cross-Modality Alignment Module for semantic alignment between time-series data and textual prompts, and an LLM-based Multi-Task Decoder. This architecture enables the simultaneous execution of three key tasks: trajectory prediction, anomaly detection, and risk assessment of vessel collisions within a single end-to-end system. Experimental results demonstrate that AIS-LLM outperforms existing methods across individual tasks, validating its effectiveness. Furthermore, by integratively analyzing task outputs to generate situation summaries and briefings, AIS-LLM presents the potential for more intelligent and efficient maritime traffic management.


Unsupervised Port Berth Identification from Automatic Identification System Data

Hadjipieris, Andreas, Dimitriou, Neofytos, Arandjelović, Ognjen

arXiv.org Artificial Intelligence

Port berthing sites are regions of high interest for monitoring and optimizing port operations. Data sourced from the Automatic Identification System (AIS) can be superimposed on berths enabling their real-time monitoring and revealing long-term utilization patterns. Ultimately, insights from multiple berths can uncover bottlenecks, and lead to the optimization of the underlying supply chain of the port and beyond. However, publicly available documentation of port berths, even when available, is frequently incomplete - e.g. there may be missing berths or inaccuracies such as incorrect boundary boxes - necessitating a more robust, data-driven approach to port berth localization. In this context, we propose an unsupervised spatial modeling method that leverages AIS data clustering and hyperparameter optimization to identify berthing sites. Trained on one month of freely available AIS data and evaluated across ports of varying sizes, our models significantly outperform competing methods, achieving a mean Bhattacharyya distance of 0.85 when comparing Gaussian Mixture Models (GMMs) trained on separate data splits, compared to 13.56 for the best existing method. Qualitative comparison with satellite images and existing berth labels further supports the superiority of our method, revealing more precise berth boundaries and improved spatial resolution across diverse port environments.


AIS Data-Driven Maritime Monitoring Based on Transformer: A Comprehensive Review

Xie, Zhiye, Tu, Enmei, Fu, Xianping, Yuan, Guoliang, Han, Yi

arXiv.org Artificial Intelligence

With the increasing demands for safety, efficiency, and sustainability in global shipping, Automatic Identification System (AIS) data plays an increasingly important role in maritime monitoring. AIS data contains spatial-temporal variation patterns of vessels that hold significant research value in the marine domain. However, due to its massive scale, the full potential of AIS data has long remained untapped. With its powerful sequence modeling capabilities, particularly its ability to capture long-range dependencies and complex temporal dynamics, the Transformer model has emerged as an effective tool for processing AIS data. Therefore, this paper reviews the research on Transformer-based AIS data-driven maritime monitoring, providing a comprehensive overview of the current applications of Transformer models in the marine field. The focus is on Transformer-based trajectory prediction methods, behavior detection, and prediction techniques. Additionally, this paper collects and organizes publicly available AIS datasets from the reviewed papers, performing data filtering, cleaning, and statistical analysis. The statistical results reveal the operational characteristics of different vessel types, providing data support for further research on maritime monitoring tasks. Finally, we offer valuable suggestions for future research, identifying two promising research directions. Datasets are available at https://github.com/eyesofworld/Maritime-Monitoring.